{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "R2AxpObRncMd"
      },
      "source": [
        "***Copyright 2020 The TensorFlow Authors.***"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "gQ5Kfh1YnkFS"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "uc0VwsT5nvQi"
      },
      "source": [
        "# TensorFlow Lattice を使った倫理のための形状制約"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "gqJQZdvfn32j"
      },
      "source": [
        "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
        "  <td><a target=\"_blank\" href=\"https://www.tensorflow.org/lattice/tutorials/shape_constraints_for_ethics\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\"> TensorFlow.orgで表示</a></td>\n",
        "  <td><a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/ja/lattice/tutorials/shape_constraints_for_ethics.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\">Google Colab で実行</a></td>\n",
        "  <td><a target=\"_blank\" href=\"https://github.com/tensorflow/docs-l10n/blob/master/site/ja/lattice/tutorials/shape_constraints_for_ethics.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\">GitHub でソースを表示{</a></td>\n",
        "  <td><a href=\"https://storage.googleapis.com/tensorflow_docs/docs-l10n/site/ja/lattice/tutorials/shape_constraints_for_ethics.ipynb\"><img src=\"https://www.tensorflow.org/images/download_logo_32px.png\">ノートブックをダウンロード/a0}</a></td>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YFZbuZMAoBny"
      },
      "source": [
        "## 概要\n",
        "\n",
        "このチュートリアルでは、TensorFlow Lattice（TFL）ライブラリを使用して、モデルが*責任を持って*振る舞い、ある*倫理的*または*平等な*想定に反しないようにトレーニングします。具体的には、単調性制約を使用して、特定の属性が*不平等にペナルティ付け*されないようにすることに焦点を当てます。このチュートリアルには、Serena Wang と Maya Gupta が執筆し、[AISTATS 2020](https://www.aistats.org/) で発表された論文「[*Deontological Ethics By Monotonicity Shape Constraints*](https://arxiv.org/abs/2001.11990)」の実験の実演が含まれます。\n",
        "\n",
        "パブリックデータセットで TFL 缶詰 Estimator を使用しますが、このチュートリアルに記載されている内容は、TFL Keras レイヤーから構築されたモデルでも実行できます。\n",
        "\n",
        "続行する前に、ランタイムに必要なすべてのパッケージがインストールされていることを確認してください（以下のコードセルでインポートされるとおりに行います）。"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "o4L76T-NpgCS"
      },
      "source": [
        "## セットアップ"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6FvmHcqbpkL7"
      },
      "source": [
        "TF Lattice パッケージをインストールします。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "f91yvUt_peYs"
      },
      "outputs": [],
      "source": [
        "#@test {\"skip\": true}\n",
        "!pip install tensorflow-lattice seaborn"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6TDoQsvSpmfx"
      },
      "source": [
        "必要なパッケージをインポートします。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "KGt0pm0b1O5X"
      },
      "outputs": [],
      "source": [
        "import tensorflow as tf\n",
        "\n",
        "import logging\n",
        "import matplotlib.pyplot as plt\n",
        "import numpy as np\n",
        "import os\n",
        "import pandas as pd\n",
        "import seaborn as sns\n",
        "from sklearn.model_selection import train_test_split\n",
        "import sys\n",
        "import tensorflow_lattice as tfl\n",
        "logging.disable(sys.maxsize)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DFN6GOcBAqzv"
      },
      "source": [
        "このチュートリアルで使用されるデフォルト値です。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "9uqMM2joAnoW"
      },
      "outputs": [],
      "source": [
        "# List of learning rate hyperparameters to try.\n",
        "# For a longer list of reasonable hyperparameters, try [0.001, 0.01, 0.1].\n",
        "LEARNING_RATES = [0.01]\n",
        "# Default number of training epochs and batch sizes.\n",
        "NUM_EPOCHS = 1000\n",
        "BATCH_SIZE = 1000\n",
        "# Directory containing dataset files.\n",
        "DATA_DIR = 'https://raw.githubusercontent.com/serenalwang/shape_constraints_for_ethics/master'"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "OZJQfJvY3ibC"
      },
      "source": [
        "# 事例 1: 法科大学院への進学適正\n",
        "\n",
        "このチュートリアルの前半では、法科大学院入学審議会（LSAC）の Law School Admissions データセットを使った事例を考察します。学生の LSAT スコアと学部課程の GPA という 2 つの特徴量を使用して、学生が司法試験に合格するかどうかを予測するための分類器をトレーニングします。\n",
        "\n",
        "分類器のスコアによって法科大学院への進学適正または奨学金が誘導されると仮定しましょう。メリットに基づく社会規範に従うと、GPA と LSAT スコアがより高い学生が分類器からより高いスコアを得ることが期待されますが、 モデルがたやすくこういった直感的な規範を違反し、より高い GPA と LSAT スコアに対して学生にペナルティが科されることがあることが観察されます。\n",
        "\n",
        "この*不平等なペナリゼーション*問題を解決するために、単調性制約を課すことで、モデルがより高い GPA と LSAT スコアにペナルティを決して科すことなく、すべて平等に審査するようにすることができます。このチュートリアルでは、TFL を使って単調性制約を課す方法を説明します。"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "vJES8lYT1fHN"
      },
      "source": [
        "## 法科大学院データを読み込む"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Cl89ZOsQ14An"
      },
      "outputs": [],
      "source": [
        "# Load data file.\n",
        "law_file_name = 'lsac.csv'\n",
        "law_file_path = os.path.join(DATA_DIR, law_file_name)\n",
        "raw_law_df = pd.read_csv(law_file_path, delimiter=',')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RCpTYCNjqOsC"
      },
      "source": [
        "データセットを前処理します。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "jdY5rtLs4xQK"
      },
      "outputs": [],
      "source": [
        "# Define label column name.\n",
        "LAW_LABEL = 'pass_bar'"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "1t1Hd8gu6Uat"
      },
      "outputs": [],
      "source": [
        "def preprocess_law_data(input_df):\n",
        "  # Drop rows with where the label or features of interest are missing.\n",
        "  output_df = input_df[~input_df[LAW_LABEL].isna() & ~input_df['ugpa'].isna() &\n",
        "                       (input_df['ugpa'] > 0) & ~input_df['lsat'].isna()]\n",
        "  return output_df\n",
        "\n",
        "\n",
        "law_df = preprocess_law_data(raw_law_df)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YhvSKr9SCrHP"
      },
      "source": [
        "### データをトレーニング/検証/テストセットに分割する"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "gQKkIGD-CvGD"
      },
      "outputs": [],
      "source": [
        "def split_dataset(input_df, random_state=888):\n",
        "  \"\"\"Splits an input dataset into train, val, and test sets.\"\"\"\n",
        "  train_df, test_val_df = train_test_split(\n",
        "      input_df, test_size=0.3, random_state=random_state)\n",
        "  val_df, test_df = train_test_split(\n",
        "      test_val_df, test_size=0.66, random_state=random_state)\n",
        "  return train_df, val_df, test_df\n",
        "\n",
        "\n",
        "law_train_df, law_val_df, law_test_df = split_dataset(law_df)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "zObwzY7f3aLy"
      },
      "source": [
        "### データの分布を視覚化する\n",
        "\n",
        "まず、データの分布を視覚化します。司法試験に合格したすべての学生と合格しなかった学生の GPA と LSAT スコアを描画します。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "dRAZB5cLORUG"
      },
      "outputs": [],
      "source": [
        "def plot_dataset_contour(input_df, title):\n",
        "  plt.rcParams['font.family'] = ['serif']\n",
        "  g = sns.jointplot(\n",
        "      x='ugpa',\n",
        "      y='lsat',\n",
        "      data=input_df,\n",
        "      kind='kde',\n",
        "      xlim=[1.4, 4],\n",
        "      ylim=[0, 50])\n",
        "  g.plot_joint(plt.scatter, c='b', s=10, linewidth=1, marker='+')\n",
        "  g.ax_joint.collections[0].set_alpha(0)\n",
        "  g.set_axis_labels('Undergraduate GPA', 'LSAT score', fontsize=14)\n",
        "  g.fig.suptitle(title, fontsize=14)\n",
        "  # Adust plot so that the title fits.\n",
        "  plt.subplots_adjust(top=0.9)\n",
        "  plt.show()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "feovlsWPQhVG"
      },
      "outputs": [],
      "source": [
        "law_df_pos = law_df[law_df[LAW_LABEL] == 1]\n",
        "plot_dataset_contour(\n",
        "    law_df_pos, title='Distribution of students that passed the bar')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ct-2tEedU0aO"
      },
      "outputs": [],
      "source": [
        "law_df_neg = law_df[law_df[LAW_LABEL] == 0]\n",
        "plot_dataset_contour(\n",
        "    law_df_neg, title='Distribution of students that failed the bar')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6grrFEMPfPjk"
      },
      "source": [
        "## 司法試験通過を予測する較正済み線形モデルをトレーニングする\n",
        "\n",
        "次に、学生が司法試験に合格するかどうかを予測するために、TFL の*較正済み線形モデル*をトレーニングします。LSAT スコアと学部課程の GPA を 2 つの特徴量として使用し、学生が司法試験に合格するかどうかをトレーニングラベルとします。\n",
        "\n",
        "まず制約を使用せずに、較正済み線形モデルをトレーニングします。次に、単調性制約を使用して較正済み線形モデルをトレーニングし、モデルの出力と精度に生じる違いを観察します。"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "vrUZvP8V736o"
      },
      "source": [
        "### TFL 較正済み線形 Estimator をトレーニングするためのヘルパー関数\n",
        "\n",
        "これらは、この法科大学院の事例と以下の債務不履行の事例に使用されます。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ollW4xAZ72kz"
      },
      "outputs": [],
      "source": [
        "def train_tfl_estimator(train_df, monotonicity, learning_rate, num_epochs,\n",
        "                        batch_size, get_input_fn,\n",
        "                        get_feature_columns_and_configs):\n",
        "  \"\"\"Trains a TFL calibrated linear estimator.\n",
        "\n",
        "  Args:\n",
        "    train_df: pandas dataframe containing training data.\n",
        "    monotonicity: if 0, then no monotonicity constraints. If 1, then all\n",
        "      features are constrained to be monotonically increasing.\n",
        "    learning_rate: learning rate of Adam optimizer for gradient descent.\n",
        "    num_epochs: number of training epochs.\n",
        "    batch_size: batch size for each epoch. None means the batch size is the full\n",
        "      dataset size.\n",
        "    get_input_fn: function that returns the input_fn for a TF estimator.\n",
        "    get_feature_columns_and_configs: function that returns TFL feature columns\n",
        "      and configs.\n",
        "\n",
        "  Returns:\n",
        "    estimator: a trained TFL calibrated linear estimator.\n",
        "\n",
        "  \"\"\"\n",
        "  feature_columns, feature_configs = get_feature_columns_and_configs(\n",
        "      monotonicity)\n",
        "\n",
        "  model_config = tfl.configs.CalibratedLinearConfig(\n",
        "      feature_configs=feature_configs, use_bias=False)\n",
        "\n",
        "  estimator = tfl.estimators.CannedClassifier(\n",
        "      feature_columns=feature_columns,\n",
        "      model_config=model_config,\n",
        "      feature_analysis_input_fn=get_input_fn(input_df=train_df, num_epochs=1),\n",
        "      optimizer=tf.keras.optimizers.Adam(learning_rate))\n",
        "\n",
        "  estimator.train(\n",
        "      input_fn=get_input_fn(\n",
        "          input_df=train_df, num_epochs=num_epochs, batch_size=batch_size))\n",
        "  return estimator\n",
        "\n",
        "\n",
        "def optimize_learning_rates(\n",
        "    train_df,\n",
        "    val_df,\n",
        "    test_df,\n",
        "    monotonicity,\n",
        "    learning_rates,\n",
        "    num_epochs,\n",
        "    batch_size,\n",
        "    get_input_fn,\n",
        "    get_feature_columns_and_configs,\n",
        "):\n",
        "  \"\"\"Optimizes learning rates for TFL estimators.\n",
        "\n",
        "  Args:\n",
        "    train_df: pandas dataframe containing training data.\n",
        "    val_df: pandas dataframe containing validation data.\n",
        "    test_df: pandas dataframe containing test data.\n",
        "    monotonicity: if 0, then no monotonicity constraints. If 1, then all\n",
        "      features are constrained to be monotonically increasing.\n",
        "    learning_rates: list of learning rates to try.\n",
        "    num_epochs: number of training epochs.\n",
        "    batch_size: batch size for each epoch. None means the batch size is the full\n",
        "      dataset size.\n",
        "    get_input_fn: function that returns the input_fn for a TF estimator.\n",
        "    get_feature_columns_and_configs: function that returns TFL feature columns\n",
        "      and configs.\n",
        "\n",
        "  Returns:\n",
        "    A single TFL estimator that achieved the best validation accuracy.\n",
        "  \"\"\"\n",
        "  estimators = []\n",
        "  train_accuracies = []\n",
        "  val_accuracies = []\n",
        "  test_accuracies = []\n",
        "  for lr in learning_rates:\n",
        "    estimator = train_tfl_estimator(\n",
        "        train_df=train_df,\n",
        "        monotonicity=monotonicity,\n",
        "        learning_rate=lr,\n",
        "        num_epochs=num_epochs,\n",
        "        batch_size=batch_size,\n",
        "        get_input_fn=get_input_fn,\n",
        "        get_feature_columns_and_configs=get_feature_columns_and_configs)\n",
        "    estimators.append(estimator)\n",
        "    train_acc = estimator.evaluate(\n",
        "        input_fn=get_input_fn(train_df, num_epochs=1))['accuracy']\n",
        "    val_acc = estimator.evaluate(\n",
        "        input_fn=get_input_fn(val_df, num_epochs=1))['accuracy']\n",
        "    test_acc = estimator.evaluate(\n",
        "        input_fn=get_input_fn(test_df, num_epochs=1))['accuracy']\n",
        "    print('accuracies for learning rate %f: train: %f, val: %f, test: %f' %\n",
        "          (lr, train_acc, val_acc, test_acc))\n",
        "    train_accuracies.append(train_acc)\n",
        "    val_accuracies.append(val_acc)\n",
        "    test_accuracies.append(test_acc)\n",
        "  max_index = val_accuracies.index(max(val_accuracies))\n",
        "  return estimators[max_index]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jeEfKSA7_aOg"
      },
      "source": [
        "### 法科大学院データセットの特徴量を構成するためのヘルパー関数\n",
        "\n",
        "これらのヘルパー関数は法科大学院の事例に特化した関数です。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "B6NU6EEKIYMJ"
      },
      "outputs": [],
      "source": [
        "def get_input_fn_law(input_df, num_epochs, batch_size=None):\n",
        "  \"\"\"Gets TF input_fn for law school models.\"\"\"\n",
        "  return tf.compat.v1.estimator.inputs.pandas_input_fn(\n",
        "      x=input_df[['ugpa', 'lsat']],\n",
        "      y=input_df['pass_bar'],\n",
        "      num_epochs=num_epochs,\n",
        "      batch_size=batch_size or len(input_df),\n",
        "      shuffle=False)\n",
        "\n",
        "\n",
        "def get_feature_columns_and_configs_law(monotonicity):\n",
        "  \"\"\"Gets TFL feature configs for law school models.\"\"\"\n",
        "  feature_columns = [\n",
        "      tf.feature_column.numeric_column('ugpa'),\n",
        "      tf.feature_column.numeric_column('lsat'),\n",
        "  ]\n",
        "  feature_configs = [\n",
        "      tfl.configs.FeatureConfig(\n",
        "          name='ugpa',\n",
        "          lattice_size=2,\n",
        "          pwl_calibration_num_keypoints=20,\n",
        "          monotonicity=monotonicity,\n",
        "          pwl_calibration_always_monotonic=False),\n",
        "      tfl.configs.FeatureConfig(\n",
        "          name='lsat',\n",
        "          lattice_size=2,\n",
        "          pwl_calibration_num_keypoints=20,\n",
        "          monotonicity=monotonicity,\n",
        "          pwl_calibration_always_monotonic=False),\n",
        "  ]\n",
        "  return feature_columns, feature_configs"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "HSfAwgiO_6YA"
      },
      "source": [
        "### トレーニング済みモデルの出力を視覚化するためのヘルパー関数"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "HESNIC5H-1dz"
      },
      "outputs": [],
      "source": [
        "def get_predicted_probabilities(estimator, input_df, get_input_fn):\n",
        "  predictions = estimator.predict(\n",
        "      input_fn=get_input_fn(input_df=input_df, num_epochs=1))\n",
        "  return [prediction['probabilities'][1] for prediction in predictions]\n",
        "\n",
        "\n",
        "def plot_model_contour(estimator, input_df, num_keypoints=20):\n",
        "  x = np.linspace(min(input_df['ugpa']), max(input_df['ugpa']), num_keypoints)\n",
        "  y = np.linspace(min(input_df['lsat']), max(input_df['lsat']), num_keypoints)\n",
        "\n",
        "  x_grid, y_grid = np.meshgrid(x, y)\n",
        "\n",
        "  positions = np.vstack([x_grid.ravel(), y_grid.ravel()])\n",
        "  plot_df = pd.DataFrame(positions.T, columns=['ugpa', 'lsat'])\n",
        "  plot_df[LAW_LABEL] = np.ones(len(plot_df))\n",
        "  predictions = get_predicted_probabilities(\n",
        "      estimator=estimator, input_df=plot_df, get_input_fn=get_input_fn_law)\n",
        "  grid_predictions = np.reshape(predictions, x_grid.shape)\n",
        "\n",
        "  plt.rcParams['font.family'] = ['serif']\n",
        "  plt.contour(\n",
        "      x_grid,\n",
        "      y_grid,\n",
        "      grid_predictions,\n",
        "      colors=('k',),\n",
        "      levels=np.linspace(0, 1, 11))\n",
        "  plt.contourf(\n",
        "      x_grid,\n",
        "      y_grid,\n",
        "      grid_predictions,\n",
        "      cmap=plt.cm.bone,\n",
        "      levels=np.linspace(0, 1, 11))  # levels=np.linspace(0,1,8));\n",
        "  plt.xticks(fontsize=20)\n",
        "  plt.yticks(fontsize=20)\n",
        "\n",
        "  cbar = plt.colorbar()\n",
        "  cbar.ax.set_ylabel('Model score', fontsize=20)\n",
        "  cbar.ax.tick_params(labelsize=20)\n",
        "\n",
        "  plt.xlabel('Undergraduate GPA', fontsize=20)\n",
        "  plt.ylabel('LSAT score', fontsize=20)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fAMSCaRHIn1w"
      },
      "source": [
        "## 制約なし（非単調性）の較正済み線形モデルをトレーニングする"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Iff8omH3Ij_x"
      },
      "outputs": [],
      "source": [
        "nomon_linear_estimator = optimize_learning_rates(\n",
        "    train_df=law_train_df,\n",
        "    val_df=law_val_df,\n",
        "    test_df=law_test_df,\n",
        "    monotonicity=0,\n",
        "    learning_rates=LEARNING_RATES,\n",
        "    batch_size=BATCH_SIZE,\n",
        "    num_epochs=NUM_EPOCHS,\n",
        "    get_input_fn=get_input_fn_law,\n",
        "    get_feature_columns_and_configs=get_feature_columns_and_configs_law)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Gxfv8hXMh4_E"
      },
      "outputs": [],
      "source": [
        "plot_model_contour(nomon_linear_estimator, input_df=law_df)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "eKVkjHg_LaWb"
      },
      "source": [
        "## 単調な較正済み線形モデルをトレーニングする"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "C_MUEvGNp6g2"
      },
      "outputs": [],
      "source": [
        "mon_linear_estimator = optimize_learning_rates(\n",
        "    train_df=law_train_df,\n",
        "    val_df=law_val_df,\n",
        "    test_df=law_test_df,\n",
        "    monotonicity=1,\n",
        "    learning_rates=LEARNING_RATES,\n",
        "    batch_size=BATCH_SIZE,\n",
        "    num_epochs=NUM_EPOCHS,\n",
        "    get_input_fn=get_input_fn_law,\n",
        "    get_feature_columns_and_configs=get_feature_columns_and_configs_law)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ABdhYOUVCXzD"
      },
      "outputs": [],
      "source": [
        "plot_model_contour(mon_linear_estimator, input_df=law_df)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fsI14lrFxRha"
      },
      "source": [
        "## ほかの制約なしモデルをトレーニングする\n",
        "\n",
        "TFL の較正済み線形モデルを、精度をそれほど大きく犠牲にすることなく LSAT スコアと GPA の両方で単調になるようにトレーニングできることを実演しました。\n",
        "\n",
        "しかし、較正済み線形モデルは、ディープニューラルネットワーク（DNN）や勾配ブースティング木（GBT）などのほかの種類のモデルとどのように比較されるでしょうか。DNN と GBT は合理的に平等な出力を持つように見えるのでしょうか。この疑問への答えをえるために、制約なしの DNN と GBT をトレーニングすることにします。実際のところ、DNN と GBT は、LSAT スコアと学部課程の GPA の単調性を大きく侵害することが観察されるでしょう。"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "uo1ruWXcvUqb"
      },
      "source": [
        "### 制約なしのディープニューラルネットワーク（DNN）モデルをトレーニングする\n",
        "\n",
        "アーキテクチャは、高い検証精度を得られるように、最適化済みです。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "3pplraob0Od-"
      },
      "outputs": [],
      "source": [
        "feature_names = ['ugpa', 'lsat']\n",
        "\n",
        "dnn_estimator = tf.estimator.DNNClassifier(\n",
        "    feature_columns=[\n",
        "        tf.feature_column.numeric_column(feature) for feature in feature_names\n",
        "    ],\n",
        "    hidden_units=[100, 100],\n",
        "    optimizer=tf.keras.optimizers.Adam(learning_rate=0.008),\n",
        "    activation_fn=tf.nn.relu)\n",
        "\n",
        "dnn_estimator.train(\n",
        "    input_fn=get_input_fn_law(\n",
        "        law_train_df, batch_size=BATCH_SIZE, num_epochs=NUM_EPOCHS))\n",
        "dnn_train_acc = dnn_estimator.evaluate(\n",
        "    input_fn=get_input_fn_law(law_train_df, num_epochs=1))['accuracy']\n",
        "dnn_val_acc = dnn_estimator.evaluate(\n",
        "    input_fn=get_input_fn_law(law_val_df, num_epochs=1))['accuracy']\n",
        "dnn_test_acc = dnn_estimator.evaluate(\n",
        "    input_fn=get_input_fn_law(law_test_df, num_epochs=1))['accuracy']\n",
        "print('accuracies for DNN: train: %f, val: %f, test: %f' %\n",
        "      (dnn_train_acc, dnn_val_acc, dnn_test_acc))"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "LwPQqLt-E7R4"
      },
      "outputs": [],
      "source": [
        "plot_model_contour(dnn_estimator, input_df=law_df)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "OOAKK0_3vWir"
      },
      "source": [
        "### 制約なしの勾配ブースティング木（GBT）モデルをトレーニングする\n",
        "\n",
        "木の構造は、高い検証精度を得られるように、最適化済みです。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "mFaI9hB-rgoL"
      },
      "outputs": [],
      "source": [
        "tree_estimator = tf.estimator.BoostedTreesClassifier(\n",
        "    feature_columns=[\n",
        "        tf.feature_column.numeric_column(feature) for feature in feature_names\n",
        "    ],\n",
        "    n_batches_per_layer=2,\n",
        "    n_trees=20,\n",
        "    max_depth=4)\n",
        "\n",
        "tree_estimator.train(\n",
        "    input_fn=get_input_fn_law(\n",
        "        law_train_df, num_epochs=NUM_EPOCHS, batch_size=BATCH_SIZE))\n",
        "tree_train_acc = tree_estimator.evaluate(\n",
        "    input_fn=get_input_fn_law(law_train_df, num_epochs=1))['accuracy']\n",
        "tree_val_acc = tree_estimator.evaluate(\n",
        "    input_fn=get_input_fn_law(law_val_df, num_epochs=1))['accuracy']\n",
        "tree_test_acc = tree_estimator.evaluate(\n",
        "    input_fn=get_input_fn_law(law_test_df, num_epochs=1))['accuracy']\n",
        "print('accuracies for GBT: train: %f, val: %f, test: %f' %\n",
        "      (tree_train_acc, tree_val_acc, tree_test_acc))"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "AZFyfQT1E_nR"
      },
      "outputs": [],
      "source": [
        "plot_model_contour(tree_estimator, input_df=law_df)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "uX2qiMlrY8aO"
      },
      "source": [
        "# 事例 2: 債務不履行\n",
        "\n",
        "このチュートリアルで考察する 2 つ目の事例は、個人の債務不履行確率の予測です。UCI リポジトリの Default of Credit Card Clients データセットを使用します。このデータは、台湾の 30,000 人のクレジットカード利用者から集められたもので、ある期間に利用者が債務不履行となるかどうかを示すバイナリラベルが含まれます。特徴量には、2005 年 4 月から 9 月までの毎月の婚姻状況、性別、学歴、および利用者の既存の支払いの延滞期間が含まれます。\n",
        "\n",
        "最初の事例と同様に、*不平等なペナリゼーション*を回避するために単調性制約を使用して説明します。モデルが利用者の信用スコアを決定するために使用される場合、ほかのすべてが同等である場合に請求書を早期に支払うことに対してペナルティが科されるのであれば、利用者は不平等だと感じることでしょう。そのため、モデルが早期払いにペナルティを科さないように、単調性制約を適用します。"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tz5yduNuFinA"
      },
      "source": [
        "## 債務不履行データを読み込む"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "KuylMNBCILwy"
      },
      "outputs": [],
      "source": [
        "# Load data file.\n",
        "credit_file_name = 'credit_default.csv'\n",
        "credit_file_path = os.path.join(DATA_DIR, credit_file_name)\n",
        "credit_df = pd.read_csv(credit_file_path, delimiter=',')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Hv_GQcEHIf9v"
      },
      "outputs": [],
      "source": [
        "# Define label column name.\n",
        "CREDIT_LABEL = 'default'"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "13oZWY0YIoy3"
      },
      "source": [
        "### データをトレーニング/検証/テストセットに分割する"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "dty5tXJqIscz"
      },
      "outputs": [],
      "source": [
        "credit_train_df, credit_val_df, credit_test_df = split_dataset(credit_df)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_kAciWXHKGV7"
      },
      "source": [
        "### データの分布を視覚化する\n",
        "\n",
        "まず、データの分布を視覚化します。さまざまな婚姻状況と支払い状況の利用者に確認される不履行率の平均と標準偏差を描画します。支払い状況は、利用者のローン返済の月数で表されます（2005 年 4 月時点）。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "8CxacQxnkHWE"
      },
      "outputs": [],
      "source": [
        "def get_agg_data(df, x_col, y_col, bins=11):\n",
        "  xbins = pd.cut(df[x_col], bins=bins)\n",
        "  data = df[[x_col, y_col]].groupby(xbins).agg(['mean', 'sem'])\n",
        "  return data\n",
        "\n",
        "\n",
        "def plot_2d_means_credit(input_df, x_col, y_col, x_label, y_label):\n",
        "  plt.rcParams['font.family'] = ['serif']\n",
        "  _, ax = plt.subplots(nrows=1, ncols=1)\n",
        "  plt.setp(ax.spines.values(), color='black', linewidth=1)\n",
        "  ax.tick_params(\n",
        "      direction='in', length=6, width=1, top=False, right=False, labelsize=18)\n",
        "  df_single = get_agg_data(input_df[input_df['MARRIAGE'] == 1], x_col, y_col)\n",
        "  df_married = get_agg_data(input_df[input_df['MARRIAGE'] == 2], x_col, y_col)\n",
        "  ax.errorbar(\n",
        "      df_single[(x_col, 'mean')],\n",
        "      df_single[(y_col, 'mean')],\n",
        "      xerr=df_single[(x_col, 'sem')],\n",
        "      yerr=df_single[(y_col, 'sem')],\n",
        "      color='orange',\n",
        "      marker='s',\n",
        "      capsize=3,\n",
        "      capthick=1,\n",
        "      label='Single',\n",
        "      markersize=10,\n",
        "      linestyle='')\n",
        "  ax.errorbar(\n",
        "      df_married[(x_col, 'mean')],\n",
        "      df_married[(y_col, 'mean')],\n",
        "      xerr=df_married[(x_col, 'sem')],\n",
        "      yerr=df_married[(y_col, 'sem')],\n",
        "      color='b',\n",
        "      marker='^',\n",
        "      capsize=3,\n",
        "      capthick=1,\n",
        "      label='Married',\n",
        "      markersize=10,\n",
        "      linestyle='')\n",
        "  leg = ax.legend(loc='upper left', fontsize=18, frameon=True, numpoints=1)\n",
        "  ax.set_xlabel(x_label, fontsize=18)\n",
        "  ax.set_ylabel(y_label, fontsize=18)\n",
        "  ax.set_ylim(0, 1.1)\n",
        "  ax.set_xlim(-2, 8.5)\n",
        "  ax.patch.set_facecolor('white')\n",
        "  leg.get_frame().set_edgecolor('black')\n",
        "  leg.get_frame().set_facecolor('white')\n",
        "  leg.get_frame().set_linewidth(1)\n",
        "  plt.show()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "VHXyYbyekKLT"
      },
      "outputs": [],
      "source": [
        "plot_2d_means_credit(credit_train_df, 'PAY_0', 'default',\n",
        "                     'Repayment Status (April)', 'Observed default rate')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4hnZBigB7kzY"
      },
      "source": [
        "## 債務不履行率を予測する較正済み線形モデルをトレーニングする\n",
        "\n",
        "次に、利用者がローンの債務不履行となるかどうかを予測するために、TFL の*較正済み線形モデル*をトレーニングします。利用者の 4 月の婚姻状況とローン返済の延滞月数（返済状況）を 2 つの特徴量として使用します。トレーニングラベルは、利用者がローンの債務不履行になったかどうかを示します。\n",
        "\n",
        "まず制約を使用せずに、較正済み線形モデルをトレーニングします。次に、単調性制約を使用して較正済み線形モデルをトレーニングし、モデルの出力と精度に生じる違いを観察します。"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "UEcHW1u3Jk_2"
      },
      "source": [
        "### 債務不履行データセットの特徴量を構成するためのヘルパー関数\n",
        "\n",
        "これらのヘルパー関数は債務不履行の事例に特化した関数です。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "QBa-hczLi7DM"
      },
      "outputs": [],
      "source": [
        "def get_input_fn_credit(input_df, num_epochs, batch_size=None):\n",
        "  \"\"\"Gets TF input_fn for credit default models.\"\"\"\n",
        "  return tf.compat.v1.estimator.inputs.pandas_input_fn(\n",
        "      x=input_df[['MARRIAGE', 'PAY_0']],\n",
        "      y=input_df['default'],\n",
        "      num_epochs=num_epochs,\n",
        "      batch_size=batch_size or len(input_df),\n",
        "      shuffle=False)\n",
        "\n",
        "\n",
        "def get_feature_columns_and_configs_credit(monotonicity):\n",
        "  \"\"\"Gets TFL feature configs for credit default models.\"\"\"\n",
        "  feature_columns = [\n",
        "      tf.feature_column.numeric_column('MARRIAGE'),\n",
        "      tf.feature_column.numeric_column('PAY_0'),\n",
        "  ]\n",
        "  feature_configs = [\n",
        "      tfl.configs.FeatureConfig(\n",
        "          name='MARRIAGE',\n",
        "          lattice_size=2,\n",
        "          pwl_calibration_num_keypoints=3,\n",
        "          monotonicity=monotonicity,\n",
        "          pwl_calibration_always_monotonic=False),\n",
        "      tfl.configs.FeatureConfig(\n",
        "          name='PAY_0',\n",
        "          lattice_size=2,\n",
        "          pwl_calibration_num_keypoints=10,\n",
        "          monotonicity=monotonicity,\n",
        "          pwl_calibration_always_monotonic=False),\n",
        "  ]\n",
        "  return feature_columns, feature_configs"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "iwxnlRrQPdTg"
      },
      "source": [
        "### トレーニング済みモデルの出力を視覚化するためのヘルパー関数"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "zVGxEfbhPZ5H"
      },
      "outputs": [],
      "source": [
        "def plot_predictions_credit(input_df,\n",
        "                            estimator,\n",
        "                            x_col,\n",
        "                            x_label='Repayment Status (April)',\n",
        "                            y_label='Predicted default probability'):\n",
        "  predictions = get_predicted_probabilities(\n",
        "      estimator=estimator, input_df=input_df, get_input_fn=get_input_fn_credit)\n",
        "  new_df = input_df.copy()\n",
        "  new_df.loc[:, 'predictions'] = predictions\n",
        "  plot_2d_means_credit(new_df, x_col, 'predictions', x_label, y_label)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "UMIpywE1P07H"
      },
      "source": [
        "## 制約なし（非単調性）の較正済み線形モデルをトレーニングする"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "FfXUKns5cPEw"
      },
      "outputs": [],
      "source": [
        "nomon_linear_estimator = optimize_learning_rates(\n",
        "    train_df=credit_train_df,\n",
        "    val_df=credit_val_df,\n",
        "    test_df=credit_test_df,\n",
        "    monotonicity=0,\n",
        "    learning_rates=LEARNING_RATES,\n",
        "    batch_size=BATCH_SIZE,\n",
        "    num_epochs=NUM_EPOCHS,\n",
        "    get_input_fn=get_input_fn_credit,\n",
        "    get_feature_columns_and_configs=get_feature_columns_and_configs_credit)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "5zQ_jm75kRX6"
      },
      "outputs": [],
      "source": [
        "plot_predictions_credit(credit_train_df, nomon_linear_estimator, 'PAY_0')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0aokp7qLQBIr"
      },
      "source": [
        "## 単調な較正済み線形モデルをトレーニングする"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "wWCG7YrLUZDH"
      },
      "outputs": [],
      "source": [
        "mon_linear_estimator = optimize_learning_rates(\n",
        "    train_df=credit_train_df,\n",
        "    val_df=credit_val_df,\n",
        "    test_df=credit_test_df,\n",
        "    monotonicity=1,\n",
        "    learning_rates=LEARNING_RATES,\n",
        "    batch_size=BATCH_SIZE,\n",
        "    num_epochs=NUM_EPOCHS,\n",
        "    get_input_fn=get_input_fn_credit,\n",
        "    get_feature_columns_and_configs=get_feature_columns_and_configs_credit)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "JCQ2eMdndFhR"
      },
      "outputs": [],
      "source": [
        "plot_predictions_credit(credit_train_df, mon_linear_estimator, 'PAY_0')"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "name": "shape_constraints_for_ethics.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
